Dataset statistics
| Number of variables | 13 |
|---|---|
| Number of observations | 2790 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 283.5 KiB |
| Average record size in memory | 104.0 B |
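The average record size can be cross-checked from the other figures in this table. The sketch below is only an arithmetic check on the reported numbers; the variable names are illustrative, and the 8-byte-per-value figure is an assumption, since the report does not list column dtypes.

```python
# Figures copied from the Dataset statistics table above.
total_kib = 283.5      # total size in memory, in KiB
n_rows = 2790          # number of observations
n_vars = 13            # number of variables

# 1 KiB = 1024 bytes; dividing by the row count gives the per-record footprint.
avg_record_bytes = total_kib * 1024 / n_rows
print(round(avg_record_bytes))   # about 104 bytes per record

# Assumption (not stated in the report): every column uses an 8-byte dtype.
bytes_per_value = 8
print(n_vars * bytes_per_value)
```

Both figures come out at roughly 104 bytes, which agrees with the reported average record size.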
Variable types
| Numeric | 13 |
|---|---|
gross_revenue is highly correlated with qt_items and 3 other fields | High correlation |
qt_items is highly correlated with gross_revenue and 3 other fields | High correlation |
qt_invoice is highly correlated with gross_revenue and 2 other fields | High correlation |
qt_products is highly correlated with gross_revenue and 3 other fields | High correlation |
avg_ticket is highly correlated with avg_unique_basket_size | High correlation |
avg_unique_basket_size is highly correlated with qt_products and 1 other fields | High correlation |
avg_basket_size is highly correlated with gross_revenue and 1 other fields | High correlation |
gross_revenue is highly correlated with qt_items and 1 other fields | High correlation |
qt_items is highly correlated with gross_revenue and 1 other fields | High correlation |
qt_invoice is highly correlated with gross_revenue and 2 other fields | High correlation |
qt_products is highly correlated with qt_invoice and 1 other fields | High correlation |
avg_ticket is highly correlated with qt_returns and 1 other fields | High correlation |
qt_returns is highly correlated with avg_ticket and 1 other fields | High correlation |
avg_unique_basket_size is highly correlated with qt_products | High correlation |
avg_basket_size is highly correlated with avg_ticket and 1 other fields | High correlation |
gross_revenue is highly correlated with qt_items and 1 other fields | High correlation |
qt_items is highly correlated with gross_revenue and 2 other fields | High correlation |
qt_invoice is highly correlated with gross_revenue and 1 other fields | High correlation |
qt_products is highly correlated with avg_unique_basket_size | High correlation |
avg_unique_basket_size is highly correlated with qt_products | High correlation |
avg_basket_size is highly correlated with qt_items | High correlation |
gross_revenue is highly correlated with qt_items and 5 other fields | High correlation |
qt_items is highly correlated with gross_revenue and 5 other fields | High correlation |
qt_invoice is highly correlated with gross_revenue and 2 other fields | High correlation |
qt_products is highly correlated with gross_revenue and 3 other fields | High correlation |
avg_ticket is highly correlated with gross_revenue and 3 other fields | High correlation |
qt_returns is highly correlated with gross_revenue and 3 other fields | High correlation |
avg_unique_basket_size is highly correlated with qt_products | High correlation |
avg_basket_size is highly correlated with gross_revenue and 3 other fields | High correlation |
avg_ticket is highly skewed (γ1 = 51.95952231) | Skewed |
frequency is highly skewed (γ1 = 47.28301617) | Skewed |
qt_returns is highly skewed (γ1 = 49.28702505) | Skewed |
avg_basket_size is highly skewed (γ1 = 45.04601647) | Skewed |
df_index has unique values | Unique |
customer_id has unique values | Unique |
recency_days has 34 (1.2%) zeros | Zeros |
qt_returns has 1456 (52.2%) zeros | Zeros |
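The "highly skewed" alerts above report a skewness coefficient γ1. As a rough illustration of what that number measures, the sketch below computes the plain moment-based skewness on toy data. The function name and the data are hypothetical, and the report's own estimator may apply a bias correction, so it should not be expected to reproduce the values above exactly.

```python
def skewness(values):
    """Moment-based sample skewness g1 = m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n   # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]            # toy data, symmetric around 3
right_tailed = [1] * 99 + [1000]       # toy data with one extreme value

print(skewness(symmetric))      # zero for symmetric data
print(skewness(right_tailed))   # large and positive for a long right tail
```

A single extreme observation is enough to push the coefficient far above zero, which is consistent with the very large maxima reported for the skewed variables.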
Reproduction
| Analysis started | 2022-03-26 14:59:45.818052 |
|---|---|
| Analysis finished | 2022-03-26 15:00:38.115295 |
| Duration | 52.3 seconds |
| Software version | pandas-profiling v3.1.0 |
| Download configuration | config.json |
df_index
Real number (ℝ≥0)
| Distinct | 2790 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2309.910753 |
| Minimum | 0 |
|---|---|
| Maximum | 5887 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 21.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 182.45 |
| Q1 | 920.25 |
| median | 2100 |
| Q3 | 3498.75 |
| 95-th percentile | 5127.55 |
| Maximum | 5887 |
| Range | 5887 |
| Interquartile range (IQR) | 2578.5 |
Descriptive statistics
| Standard deviation | 1577.346218 |
|---|---|
| Coefficient of variation (CV) | 0.6828602429 |
| Kurtosis | -0.9354811747 |
| Mean | 2309.910753 |
| Median Absolute Deviation (MAD) | 1267 |
| Skewness | 0.3979437407 |
| Sum | 6444651 |
| Variance | 2488021.09 |
| Monotonicity | Strictly increasing |
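Several of the descriptive statistics are simple functions of one another. The check below is a minimal sketch that uses only the figures reported for the variable profiled above; the variable names and the tolerance are illustrative choices.

```python
import math

# Figures copied from the statistics tables above.
n = 2790
total, mean = 6444651, 2309.910753
std, variance, cv = 1577.346218, 2488021.09, 0.6828602429
q1, q3, iqr = 920.25, 3498.75, 2578.5
minimum, maximum, value_range = 0, 5887, 5887

# Each entry compares a reported value with the value derived from the others.
checks = {
    "mean = sum / n": math.isclose(total / n, mean, rel_tol=1e-6),
    "variance = std^2": math.isclose(std ** 2, variance, rel_tol=1e-6),
    "CV = std / mean": math.isclose(std / mean, cv, rel_tol=1e-6),
    "IQR = Q3 - Q1": math.isclose(q3 - q1, iqr),
    "range = max - min": maximum - minimum == value_range,
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```

The same relations can be applied to the tables of the other variables as a quick consistency check.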
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 1 | < 0.1% |
| 2976 | 1 | < 0.1% |
| 2963 | 1 | < 0.1% |
| 2964 | 1 | < 0.1% |
| 2965 | 1 | < 0.1% |
| 2966 | 1 | < 0.1% |
| 2968 | 1 | < 0.1% |
| 2969 | 1 | < 0.1% |
| 2973 | 1 | < 0.1% |
| 2974 | 1 | < 0.1% |
| Other values (2780) | 2780 |
| Value | Count | Frequency (%) |
| 0 | 1 | |
| 1 | 1 | |
| 2 | 1 | |
| 3 | 1 | |
| 4 | 1 | |
| 5 | 1 | |
| 6 | 1 | |
| 7 | 1 | |
| 8 | 1 | |
| 9 | 1 |
| Value | Count | Frequency (%) |
| 5887 | 1 | |
| 5877 | 1 | |
| 5871 | 1 | |
| 5846 | 1 | |
| 5840 | 1 | |
| 5829 | 1 | |
| 5828 | 1 | |
| 5811 | 1 | |
| 5810 | 1 | |
| 5808 | 1 |
customer_id
Real number (ℝ≥0)
| Distinct | 2790 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 15281.86846 |
| Minimum | 12347 |
|---|---|
| Maximum | 18287 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 21.9 KiB |
Quantile statistics
| Minimum | 12347 |
|---|---|
| 5-th percentile | 12620.45 |
| Q1 | 13812.25 |
| median | 15240.5 |
| Q3 | 16778.75 |
| 95-th percentile | 17950.55 |
| Maximum | 18287 |
| Range | 5940 |
| Interquartile range (IQR) | 2966.5 |
Descriptive statistics
| Standard deviation | 1717.476402 |
|---|---|
| Coefficient of variation (CV) | 0.1123865453 |
| Kurtosis | -1.207312643 |
| Mean | 15281.86846 |
| Median Absolute Deviation (MAD) | 1485 |
| Skewness | 0.01466970702 |
| Sum | 42636413 |
| Variance | 2949725.19 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 17850 | 1 | < 0.1% |
| 16806 | 1 | < 0.1% |
| 16249 | 1 | < 0.1% |
| 14198 | 1 | < 0.1% |
| 13989 | 1 | < 0.1% |
| 17930 | 1 | < 0.1% |
| 14482 | 1 | < 0.1% |
| 14163 | 1 | < 0.1% |
| 13811 | 1 | < 0.1% |
| 12457 | 1 | < 0.1% |
| Other values (2780) | 2780 |
| Value | Count | Frequency (%) |
| 12347 | 1 | |
| 12348 | 1 | |
| 12352 | 1 | |
| 12356 | 1 | |
| 12358 | 1 | |
| 12359 | 1 | |
| 12360 | 1 | |
| 12362 | 1 | |
| 12363 | 1 | |
| 12364 | 1 |
| Value | Count | Frequency (%) |
| 18287 | 1 | |
| 18283 | 1 | |
| 18282 | 1 | |
| 18273 | 1 | |
| 18272 | 1 | |
| 18270 | 1 | |
| 18265 | 1 | |
| 18263 | 1 | |
| 18261 | 1 | |
| 18260 | 1 |
gross_revenue
Real number (ℝ≥0)
| Distinct | 2773 |
|---|---|
| Distinct (%) | 99.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2941.458523 |
| Minimum | 6.9 |
|---|---|
| Maximum | 280206.02 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 21.9 KiB |
Quantile statistics
| Minimum | 6.9 |
|---|---|
| 5-th percentile | 263.363 |
| Q1 | 628.9125 |
| median | 1192.93 |
| Q3 | 2474.305 |
| 95-th percentile | 7813.0015 |
| Maximum | 280206.02 |
| Range | 280199.12 |
| Interquartile range (IQR) | 1845.3925 |
Descriptive statistics
| Standard deviation | 10981.33843 |
|---|---|
| Coefficient of variation (CV) | 3.733297052 |
| Kurtosis | 327.0532004 |
| Mean | 2941.458523 |
| Median Absolute Deviation (MAD) | 703.095 |
| Skewness | 16.11726891 |
| Sum | 8206669.28 |
| Variance | 120589793.8 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 889.93 | 2 | 0.1% |
| 1353.74 | 2 | 0.1% |
| 1418.03 | 2 | 0.1% |
| 745.06 | 2 | 0.1% |
| 2092.32 | 2 | 0.1% |
| 178.96 | 2 | 0.1% |
| 618.09 | 2 | 0.1% |
| 734.94 | 2 | 0.1% |
| 379.65 | 2 | 0.1% |
| 1314.45 | 2 | 0.1% |
| Other values (2763) | 2770 |
| Value | Count | Frequency (%) |
| 6.9 | 1 | |
| 36.56 | 1 | |
| 52 | 1 | |
| 52.2 | 1 | |
| 62.43 | 1 | |
| 68.84 | 1 | |
| 70.02 | 1 | |
| 77.4 | 1 | |
| 84.65 | 1 | |
| 90.3 | 1 |
| Value | Count | Frequency (%) |
| 280206.02 | 1 | |
| 259657.3 | 1 | |
| 194550.79 | 1 | |
| 168472.5 | 1 | |
| 143825.06 | 1 | |
| 124914.53 | 1 | |
| 117379.63 | 1 | |
| 91062.38 | 1 | |
| 81024.84 | 1 | |
| 66653.56 | 1 |
recency_days
Real number (ℝ≥0)
| Distinct | 251 |
|---|---|
| Distinct (%) | 9.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 56.7921147 |
| Minimum | 0 |
|---|---|
| Maximum | 372 |
| Zeros | 34 |
| Zeros (%) | 1.2% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 21.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 10 |
| median | 29 |
| Q3 | 73 |
| 95-th percentile | 211 |
| Maximum | 372 |
| Range | 372 |
| Interquartile range (IQR) | 63 |
Descriptive statistics
| Standard deviation | 68.2991265 |
|---|---|
| Coefficient of variation (CV) | 1.202616364 |
| Kurtosis | 3.350490882 |
| Mean | 56.7921147 |
| Median Absolute Deviation (MAD) | 23.5 |
| Skewness | 1.881292285 |
| Sum | 158450 |
| Variance | 4664.77068 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 1 | 99 | 3.5% |
| 4 | 87 | 3.1% |
| 2 | 85 | 3.0% |
| 3 | 85 | 3.0% |
| 8 | 76 | 2.7% |
| 10 | 67 | 2.4% |
| 9 | 67 | 2.4% |
| 7 | 65 | 2.3% |
| 17 | 62 | 2.2% |
| 22 | 55 | 2.0% |
| Other values (241) | 2042 |
| Value | Count | Frequency (%) |
| 0 | 34 | 1.2% |
| 1 | 99 | |
| 2 | 85 | |
| 3 | 85 | |
| 4 | 87 | |
| 5 | 43 | |
| 7 | 65 | |
| 8 | 76 | |
| 9 | 67 | |
| 10 | 67 |
| Value | Count | Frequency (%) |
| 372 | 1 | < 0.1% |
| 366 | 1 | < 0.1% |
| 360 | 1 | < 0.1% |
| 358 | 3 | |
| 337 | 1 | < 0.1% |
| 336 | 2 | |
| 334 | 1 | < 0.1% |
| 333 | 2 | |
| 330 | 1 | < 0.1% |
| 326 | 1 | < 0.1% |
qt_items
Real number (ℝ≥0)
| Distinct | 1642 |
|---|---|
| Distinct (%) | 58.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1698.045878 |
| Minimum | 2 |
|---|---|
| Maximum | 196915 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 21.9 KiB |
Quantile statistics
| Minimum | 2 |
|---|---|
| 5-th percentile | 117.45 |
| Q1 | 330 |
| median | 700.5 |
| Q3 | 1478.75 |
| 95-th percentile | 4635.75 |
| Maximum | 196915 |
| Range | 196913 |
| Interquartile range (IQR) | 1148.75 |
Descriptive statistics
| Standard deviation | 6068.875091 |
|---|---|
| Coefficient of variation (CV) | 3.574034818 |
| Kurtosis | 438.7014545 |
| Mean | 1698.045878 |
| Median Absolute Deviation (MAD) | 450 |
| Skewness | 17.32924852 |
| Sum | 4737548 |
| Variance | 36831244.88 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 300 | 10 | 0.4% |
| 310 | 10 | 0.4% |
| 150 | 8 | 0.3% |
| 394 | 8 | 0.3% |
| 88 | 8 | 0.3% |
| 219 | 7 | 0.3% |
| 246 | 7 | 0.3% |
| 306 | 7 | 0.3% |
| 272 | 7 | 0.3% |
| 493 | 7 | 0.3% |
| Other values (1632) | 2711 |
| Value | Count | Frequency (%) |
| 2 | 1 | |
| 3 | 1 | |
| 16 | 1 | |
| 17 | 1 | |
| 19 | 1 | |
| 20 | 1 | |
| 25 | 1 | |
| 27 | 2 | |
| 30 | 1 | |
| 32 | 1 |
| Value | Count | Frequency (%) |
| 196915 | 1 | |
| 80997 | 1 | |
| 80265 | 1 | |
| 77374 | 1 | |
| 69993 | 1 | |
| 64549 | 1 | |
| 64124 | 1 | |
| 63312 | 1 | |
| 58343 | 1 | |
| 57885 | 1 |
qt_invoice
Real number (ℝ≥0)
| Distinct | 58 |
|---|---|
| Distinct (%) | 2.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.065232975 |
| Minimum | 2 |
|---|---|
| Maximum | 209 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 21.9 KiB |
Quantile statistics
| Minimum | 2 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 2 |
| median | 4 |
| Q3 | 7 |
| 95-th percentile | 17 |
| Maximum | 209 |
| Range | 207 |
| Interquartile range (IQR) | 5 |
Descriptive statistics
| Standard deviation | 9.116357263 |
|---|---|
| Coefficient of variation (CV) | 1.503051457 |
| Kurtosis | 187.7369859 |
| Mean | 6.065232975 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | 10.72802592 |
| Sum | 16922 |
| Variance | 83.10796974 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 2 | 785 | |
| 3 | 505 | |
| 4 | 386 | |
| 5 | 242 | 8.7% |
| 6 | 172 | 6.2% |
| 7 | 143 | 5.1% |
| 8 | 98 | 3.5% |
| 9 | 68 | 2.4% |
| 10 | 54 | 1.9% |
| 11 | 52 | 1.9% |
| Other values (48) | 285 | 10.2% |
| Value | Count | Frequency (%) |
| 2 | 785 | |
| 3 | 505 | |
| 4 | 386 | |
| 5 | 242 | 8.7% |
| 6 | 172 | 6.2% |
| 7 | 143 | 5.1% |
| 8 | 98 | 3.5% |
| 9 | 68 | 2.4% |
| 10 | 54 | 1.9% |
| 11 | 52 | 1.9% |
| Value | Count | Frequency (%) |
| 209 | 1 | |
| 201 | 1 | |
| 124 | 1 | |
| 97 | 1 | |
| 93 | 1 | |
| 91 | 1 | |
| 86 | 1 | |
| 73 | 1 | |
| 63 | 1 | |
| 62 | 1 |
qt_products
Real number (ℝ≥0)
| Distinct | 341 |
|---|---|
| Distinct (%) | 12.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 83.07168459 |
| Minimum | 1 |
|---|---|
| Maximum | 1786 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 21.9 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 8 |
| Q1 | 29 |
| median | 56 |
| Q3 | 105 |
| 95-th percentile | 239.55 |
| Maximum | 1786 |
| Range | 1785 |
| Interquartile range (IQR) | 76 |
Descriptive statistics
| Standard deviation | 98.60104224 |
|---|---|
| Coefficient of variation (CV) | 1.186939241 |
| Kurtosis | 80.64019576 |
| Mean | 83.07168459 |
| Median Absolute Deviation (MAD) | 33 |
| Skewness | 6.347933357 |
| Sum | 231770 |
| Variance | 9722.16553 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 37 | 39 | 1.4% |
| 24 | 37 | 1.3% |
| 26 | 36 | 1.3% |
| 25 | 35 | 1.3% |
| 33 | 35 | 1.3% |
| 28 | 34 | 1.2% |
| 18 | 32 | 1.1% |
| 30 | 32 | 1.1% |
| 15 | 30 | 1.1% |
| 23 | 30 | 1.1% |
| Other values (331) | 2450 |
| Value | Count | Frequency (%) |
| 1 | 21 | |
| 2 | 13 | |
| 3 | 18 | |
| 4 | 18 | |
| 5 | 23 | |
| 6 | 19 | |
| 7 | 22 | |
| 8 | 24 | |
| 9 | 23 | |
| 10 | 20 |
| Value | Count | Frequency (%) |
| 1786 | 1 | |
| 1766 | 1 | |
| 1322 | 1 | |
| 1118 | 1 | |
| 884 | 1 | |
| 817 | 1 | |
| 717 | 1 | |
| 714 | 1 | |
| 699 | 1 | |
| 636 | 1 |
avg_ticket
Real number (ℝ≥0)
| Distinct | 2788 |
|---|---|
| Distinct (%) | 99.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 52.9729736 |
| Minimum | 2.150588235 |
|---|---|
| Maximum | 56157.5 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 21.9 KiB |
Quantile statistics
| Minimum | 2.150588235 |
|---|---|
| 5-th percentile | 4.851708778 |
| Q1 | 12.51992731 |
| median | 18.1679386 |
| Q3 | 25.34082943 |
| 95-th percentile | 89.01555123 |
| Maximum | 56157.5 |
| Range | 56155.34941 |
| Interquartile range (IQR) | 12.82090212 |
Descriptive statistics
| Standard deviation | 1068.595594 |
|---|---|
| Coefficient of variation (CV) | 20.17246762 |
| Kurtosis | 2727.49197 |
| Mean | 52.9729736 |
| Median Absolute Deviation (MAD) | 6.418897485 |
| Skewness | 51.95952231 |
| Sum | 147794.5963 |
| Variance | 1141896.544 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 14.47833333 | 2 | 0.1% |
| 4.162 | 2 | 0.1% |
| 18.15222222 | 1 | < 0.1% |
| 38.1166129 | 1 | < 0.1% |
| 26.08797101 | 1 | < 0.1% |
| 17.98461538 | 1 | < 0.1% |
| 30.88 | 1 | < 0.1% |
| 44.62769231 | 1 | < 0.1% |
| 14.36215278 | 1 | < 0.1% |
| 47.03673077 | 1 | < 0.1% |
| Other values (2778) | 2778 |
| Value | Count | Frequency (%) |
| 2.150588235 | 1 | |
| 2.4325 | 1 | |
| 2.462371134 | 1 | |
| 2.504876033 | 1 | |
| 2.50837156 | 1 | |
| 2.65 | 1 | |
| 2.656931818 | 1 | |
| 2.707598253 | 1 | |
| 2.760621572 | 1 | |
| 2.771005291 | 1 |
| Value | Count | Frequency (%) |
| 56157.5 | 1 | |
| 4453.43 | 1 | |
| 2027.86 | 1 | |
| 1687.2 | 1 | |
| 952.9875 | 1 | |
| 872.13 | 1 | |
| 835.864 | 1 | |
| 643.8585714 | 1 | |
| 640 | 1 | |
| 624.4 | 1 |
avg_recency_days
Real number (ℝ≥0)
| Distinct | 45 |
|---|---|
| Distinct (%) | 1.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.020807817 |
| Minimum | 1 |
|---|---|
| Maximum | 3 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 21.9 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 1 |
| Q3 | 1 |
| 95-th percentile | 1.081833333 |
| Maximum | 3 |
| Range | 2 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.1172111288 |
|---|---|
| Coefficient of variation (CV) | 0.1148219349 |
| Kurtosis | 93.59539097 |
| Mean | 1.020807817 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 8.538125149 |
| Sum | 2848.05381 |
| Variance | 0.01373844871 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=45)
| Value | Count | Frequency (%) |
| 1 | 2616 | |
| 2 | 19 | 0.7% |
| 1.5 | 18 | 0.6% |
| 1.2 | 15 | 0.5% |
| 1.25 | 15 | 0.5% |
| 1.333333333 | 14 | 0.5% |
| 1.166666667 | 14 | 0.5% |
| 1.142857143 | 10 | 0.4% |
| 1.666666667 | 5 | 0.2% |
| 1.071428571 | 4 | 0.1% |
| Other values (35) | 60 | 2.2% |
| Value | Count | Frequency (%) |
| 1 | 2616 | |
| 1.021276596 | 1 | < 0.1% |
| 1.027027027 | 1 | < 0.1% |
| 1.028571429 | 1 | < 0.1% |
| 1.030534351 | 1 | < 0.1% |
| 1.030769231 | 1 | < 0.1% |
| 1.035714286 | 2 | 0.1% |
| 1.038461538 | 1 | < 0.1% |
| 1.042857143 | 1 | < 0.1% |
| 1.043478261 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 3 | 2 | 0.1% |
| 2 | 19 | |
| 1.823529412 | 1 | < 0.1% |
| 1.666666667 | 5 | 0.2% |
| 1.5 | 18 | |
| 1.48 | 1 | < 0.1% |
| 1.416666667 | 1 | < 0.1% |
| 1.4 | 3 | 0.1% |
| 1.375 | 1 | < 0.1% |
| 1.333333333 | 14 |
frequency
Real number (ℝ≥0)
| Distinct | 1235 |
|---|---|
| Distinct (%) | 44.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.06310623622 |
| Minimum | 0.005464480874 |
|---|---|
| Maximum | 34 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 21.9 KiB |
Quantile statistics
| Minimum | 0.005464480874 |
|---|---|
| 5-th percentile | 0.008771929825 |
| Q1 | 0.01587301587 |
| median | 0.02469135802 |
| Q3 | 0.04255319149 |
| 95-th percentile | 0.1212022367 |
| Maximum | 34 |
| Range | 33.99453552 |
| Interquartile range (IQR) | 0.02668017562 |
Descriptive statistics
| Standard deviation | 0.66878999 |
|---|---|
| Coefficient of variation (CV) | 10.5978431 |
| Kurtosis | 2382.264166 |
| Mean | 0.06310623622 |
| Median Absolute Deviation (MAD) | 0.01099272789 |
| Skewness | 47.28301617 |
| Sum | 176.0663991 |
| Variance | 0.4472800507 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0.07142857143 | 15 | 0.5% |
| 0.04761904762 | 14 | 0.5% |
| 0.02857142857 | 14 | 0.5% |
| 0.0303030303 | 14 | 0.5% |
| 0.01587301587 | 14 | 0.5% |
| 0.02380952381 | 13 | 0.5% |
| 0.06451612903 | 13 | 0.5% |
| 0.025 | 12 | 0.4% |
| 0.1176470588 | 12 | 0.4% |
| 0.03846153846 | 12 | 0.4% |
| Other values (1225) | 2657 |
| Value | Count | Frequency (%) |
| 0.005464480874 | 1 | < 0.1% |
| 0.005479452055 | 1 | < 0.1% |
| 0.005494505495 | 1 | < 0.1% |
| 0.005509641873 | 1 | < 0.1% |
| 0.005602240896 | 2 | |
| 0.005617977528 | 1 | < 0.1% |
| 0.005633802817 | 2 | |
| 0.005681818182 | 1 | < 0.1% |
| 0.005698005698 | 2 | |
| 0.005714285714 | 3 |
| Value | Count | Frequency (%) |
| 34 | 1 | < 0.1% |
| 6 | 1 | < 0.1% |
| 4 | 1 | < 0.1% |
| 2 | 7 | |
| 1.5 | 1 | < 0.1% |
| 1.333333333 | 2 | 0.1% |
| 1 | 5 | |
| 0.6666666667 | 3 | |
| 0.5603217158 | 1 | < 0.1% |
| 0.5403225806 | 1 | < 0.1% |
qt_returns
Real number (ℝ≥0)
| Distinct | 208 |
|---|---|
| Distinct (%) | 7.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 68.68960573 |
| Minimum | 0 |
|---|---|
| Maximum | 80995 |
| Zeros | 1456 |
| Zeros (%) | 52.2% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 21.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 9 |
| 95-th percentile | 102.1 |
| Maximum | 80995 |
| Range | 80995 |
| Interquartile range (IQR) | 9 |
Descriptive statistics
| Standard deviation | 1570.684029 |
|---|---|
| Coefficient of variation (CV) | 22.86640041 |
| Kurtosis | 2530.258092 |
| Mean | 68.68960573 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 49.28702505 |
| Sum | 191644 |
| Variance | 2467048.319 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 1456 | |
| 1 | 145 | 5.2% |
| 2 | 121 | 4.3% |
| 3 | 84 | 3.0% |
| 4 | 73 | 2.6% |
| 5 | 58 | 2.1% |
| 6 | 58 | 2.1% |
| 8 | 45 | 1.6% |
| 12 | 42 | 1.5% |
| 7 | 41 | 1.5% |
| Other values (198) | 667 |
| Value | Count | Frequency (%) |
| 0 | 1456 | |
| 1 | 145 | 5.2% |
| 2 | 121 | 4.3% |
| 3 | 84 | 3.0% |
| 4 | 73 | 2.6% |
| 5 | 58 | 2.1% |
| 6 | 58 | 2.1% |
| 7 | 41 | 1.5% |
| 8 | 45 | 1.6% |
| 9 | 35 | 1.3% |
| Value | Count | Frequency (%) |
| 80995 | 1 | |
| 9361 | 1 | |
| 9014 | 1 | |
| 8060 | 1 | |
| 4627 | 1 | |
| 3768 | 1 | |
| 3335 | 1 | |
| 2975 | 1 | |
| 2160 | 1 | |
| 2022 | 1 |
avg_unique_basket_size
Real number (ℝ≥0)
High correlation
| Distinct | 1002 |
|---|---|
| Distinct (%) | 35.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 21.9785979 |
| Minimum | 0.5 |
|---|---|
| Maximum | 299.7058824 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 21.9 KiB |
Quantile statistics
| Minimum | 0.5 |
|---|---|
| 5-th percentile | 3.333333333 |
| Q1 | 10 |
| median | 17.21111111 |
| Q3 | 27.9375 |
| 95-th percentile | 56.27333333 |
| Maximum | 299.7058824 |
| Range | 299.2058824 |
| Interquartile range (IQR) | 17.9375 |
Descriptive statistics
| Standard deviation | 18.76755425 |
|---|---|
| Coefficient of variation (CV) | 0.8539013421 |
| Kurtosis | 24.64773574 |
| Mean | 21.9785979 |
| Median Absolute Deviation (MAD) | 8.211111111 |
| Skewness | 3.189408317 |
| Sum | 61320.28815 |
| Variance | 352.2210925 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 13 | 46 | 1.6% |
| 14 | 30 | 1.1% |
| 11 | 29 | 1.0% |
| 9 | 28 | 1.0% |
| 7.5 | 25 | 0.9% |
| 1 | 25 | 0.9% |
| 10.5 | 25 | 0.9% |
| 9.5 | 25 | 0.9% |
| 17.5 | 24 | 0.9% |
| 15.5 | 24 | 0.9% |
| Other values (992) | 2509 |
| Value | Count | Frequency (%) |
| 0.5 | 2 | 0.1% |
| 0.8571428571 | 1 | < 0.1% |
| 1 | 25 | |
| 1.2 | 1 | < 0.1% |
| 1.25 | 1 | < 0.1% |
| 1.333333333 | 2 | 0.1% |
| 1.5 | 8 | 0.3% |
| 1.533333333 | 1 | < 0.1% |
| 1.571428571 | 1 | < 0.1% |
| 1.666666667 | 4 | 0.1% |
| Value | Count | Frequency (%) |
| 299.7058824 | 1 | |
| 203.5 | 1 | |
| 145 | 1 | |
| 136.125 | 1 | |
| 135.5 | 1 | |
| 122 | 1 | |
| 118 | 1 | |
| 114 | 1 | |
| 110.3333333 | 1 | |
| 110 | 1 |
avg_basket_size
Real number (ℝ≥0)
High correlation, Skewed
| Distinct | 1937 |
|---|---|
| Distinct (%) | 69.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 244.8489954 |
| Minimum | 1 |
|---|---|
| Maximum | 40498.5 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 21.9 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 44.575 |
| Q1 | 103.225 |
| median | 171.8571429 |
| Q3 | 277.28125 |
| 95-th percentile | 590.275 |
| Maximum | 40498.5 |
| Range | 40497.5 |
| Interquartile range (IQR) | 174.05625 |
Descriptive statistics
| Standard deviation | 805.4131065 |
|---|---|
| Coefficient of variation (CV) | 3.289427858 |
| Kurtosis | 2240.24006 |
| Mean | 244.8489954 |
| Median Absolute Deviation (MAD) | 81.14285714 |
| Skewness | 45.04601647 |
| Sum | 683128.6973 |
| Variance | 648690.2721 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 100 | 11 | 0.4% |
| 86 | 8 | 0.3% |
| 82 | 8 | 0.3% |
| 60 | 8 | 0.3% |
| 197 | 7 | 0.3% |
| 64 | 7 | 0.3% |
| 75 | 7 | 0.3% |
| 143.5 | 7 | 0.3% |
| 44 | 7 | 0.3% |
| 153 | 6 | 0.2% |
| Other values (1927) | 2714 |
| Value | Count | Frequency (%) |
| 1 | 1 | |
| 1.5 | 1 | |
| 3.333333333 | 1 | |
| 5.333333333 | 1 | |
| 5.666666667 | 1 | |
| 6.142857143 | 1 | |
| 7.5 | 1 | |
| 9 | 1 | |
| 9.5 | 1 | |
| 11 | 1 |
| Value | Count | Frequency (%) |
| 40498.5 | 1 | |
| 6009.333333 | 1 | |
| 3684.47619 | 1 | |
| 2880 | 1 | |
| 2697.465753 | 1 | |
| 2183.2 | 1 | |
| 2160.333333 | 1 | |
| 2082.225806 | 1 | |
| 2000 | 1 | |
| 1953.5 | 1 |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
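The rank-based definition can be sketched in a few lines of standard-library Python. The helper names `average_ranks`, `pearson` and `spearman` are hypothetical and the data is a toy example; this is an illustration of the formula, not the code the report itself runs.

```python
from statistics import mean, pstdev

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Covariance divided by the product of the standard deviations."""
    mx, my = mean(x), mean(y)
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (pstdev(x) * pstdev(y))

def spearman(x, y):
    """Pearson correlation of the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]          # monotonic but nonlinear in x

print(spearman(x, y))            # essentially 1: the ranks agree perfectly
print(pearson(x, y))             # below 1: the relationship is not linear
```

The toy data shows the difference described above: a monotonic but nonlinear relationship gives ρ of essentially 1 while r stays below 1.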
Pearson's r
Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
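A minimal sketch of this formula, together with the invariance property, using only the Python standard library. The function name `pearson` and the toy data are illustrative assumptions.

```python
from statistics import mean, pstdev

def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = mean(x), mean(y)
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (pstdev(x) * pstdev(y))

x = [1.0, 2.0, 4.0, 7.0, 11.0]
y = [2.0, 3.0, 3.5, 8.0, 9.0]

r = pearson(x, y)
# Shift and rescale each variable separately (positive scale factors).
r_transformed = pearson([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
print(r, r_transformed)          # the two values agree up to rounding

# For an exact linear function the slope does not matter.
steep = pearson(x, [100 * a for a in x])
shallow = pearson(x, [0.01 * a for a in x])
print(steep, shallow)            # both essentially 1
```

Note that the invariance holds for positive scale factors; multiplying one variable by a negative constant flips the sign of r.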
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
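The pair-counting definition can be written out directly. The sketch below implements the simple variant described in the text (often called tau-a); the function name `kendall_tau_a` and the data are illustrative, and statistical libraries commonly use a tie-adjusted variant instead, so results may differ when the data contains ties.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs; ties count as neither."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # Positive product: both variables order the pair the same way.
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)

x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 5, 4]              # two neighbouring pairs are swapped

tau = kendall_tau_a(x, y)
print(tau)                       # 8 concordant, 2 discordant, 10 pairs in total
```

With 8 concordant and 2 discordant pairs out of 10, τ is (8 - 2) / 10 = 0.6.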
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
First rows
| | df_index | customer_id | gross_revenue | recency_days | qt_items | qt_invoice | qt_products | avg_ticket | avg_recency_days | frequency | qt_returns | avg_unique_basket_size | avg_basket_size |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 17850 | 5391.2100 | 372.0000 | 1733.0000 | 34.0000 | 21.0000 | 18.1522 | 1.0000 | 34.0000 | 40.0000 | 8.7353 | 50.9706 |
| 1 | 1 | 13047 | 3237.5400 | 31.0000 | 1391.0000 | 10.0000 | 105.0000 | 18.8229 | 1.0000 | 0.0292 | 36.0000 | 17.1000 | 139.1000 |
| 2 | 2 | 12583 | 7281.3800 | 2.0000 | 5060.0000 | 15.0000 | 114.0000 | 29.4793 | 1.0000 | 0.0404 | 51.0000 | 15.4667 | 337.3333 |
| 3 | 3 | 13748 | 948.2500 | 95.0000 | 439.0000 | 5.0000 | 24.0000 | 33.8661 | 1.0000 | 0.0180 | 0.0000 | 5.6000 | 87.8000 |
| 4 | 4 | 15100 | 876.0000 | 333.0000 | 80.0000 | 3.0000 | 1.0000 | 292.0000 | 1.0000 | 0.0750 | 22.0000 | 1.0000 | 26.6667 |
| 5 | 5 | 15291 | 4668.3000 | 25.0000 | 2103.0000 | 15.0000 | 61.0000 | 45.3233 | 1.0000 | 0.0431 | 29.0000 | 6.8000 | 140.2000 |
| 6 | 6 | 14688 | 5630.8700 | 7.0000 | 3621.0000 | 21.0000 | 148.0000 | 17.2198 | 1.0526 | 0.0574 | 399.0000 | 15.5714 | 172.4286 |
| 7 | 7 | 17809 | 5411.9100 | 16.0000 | 2057.0000 | 12.0000 | 46.0000 | 88.7198 | 1.0000 | 0.0336 | 42.0000 | 5.0833 | 171.4167 |
| 8 | 8 | 15311 | 60767.9000 | 0.0000 | 38194.0000 | 91.0000 | 567.0000 | 25.5435 | 1.0000 | 0.2440 | 474.0000 | 26.1429 | 419.7143 |
| 9 | 9 | 14527 | 8508.8200 | 2.0000 | 2089.0000 | 55.0000 | 329.0000 | 8.7539 | 1.0000 | 0.1499 | 40.0000 | 17.6545 | 37.9818 |
Last rows
| | df_index | customer_id | gross_revenue | recency_days | qt_items | qt_invoice | qt_products | avg_ticket | avg_recency_days | frequency | qt_returns | avg_unique_basket_size | avg_basket_size |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2780 | 5808 | 12784 | 574.4200 | 9.0000 | 300.0000 | 2.0000 | 53.0000 | 9.7359 | 1.0000 | 0.3333 | 0.0000 | 29.0000 | 150.0000 |
| 2781 | 5810 | 14785 | 77.4000 | 10.0000 | 84.0000 | 2.0000 | 2.0000 | 25.8000 | 1.0000 | 0.4000 | 0.0000 | 1.5000 | 42.0000 |
| 2782 | 5811 | 17254 | 272.4400 | 4.0000 | 252.0000 | 2.0000 | 100.0000 | 2.4325 | 1.0000 | 0.1818 | 0.0000 | 56.0000 | 126.0000 |
| 2783 | 5828 | 17232 | 421.5200 | 2.0000 | 203.0000 | 2.0000 | 30.0000 | 11.7089 | 1.0000 | 0.1667 | 0.0000 | 18.0000 | 101.5000 |
| 2784 | 5829 | 17468 | 137.0000 | 10.0000 | 116.0000 | 2.0000 | 5.0000 | 27.4000 | 1.0000 | 0.5000 | 0.0000 | 2.5000 | 58.0000 |
| 2785 | 5840 | 13596 | 697.0400 | 5.0000 | 406.0000 | 2.0000 | 133.0000 | 4.1990 | 1.0000 | 0.2857 | 0.0000 | 83.0000 | 203.0000 |
| 2786 | 5846 | 14893 | 1237.8500 | 9.0000 | 799.0000 | 2.0000 | 72.0000 | 16.9568 | 1.0000 | 1.0000 | 0.0000 | 36.5000 | 399.5000 |
| 2787 | 5871 | 14126 | 706.1300 | 7.0000 | 508.0000 | 3.0000 | 14.0000 | 47.0753 | 1.0000 | 1.0000 | 50.0000 | 5.0000 | 169.3333 |
| 2788 | 5877 | 13521 | 1093.6500 | 1.0000 | 736.0000 | 3.0000 | 312.0000 | 2.5084 | 1.0000 | 0.3333 | 0.0000 | 145.0000 | 245.3333 |
| 2789 | 5887 | 15060 | 303.0900 | 8.0000 | 263.0000 | 4.0000 | 80.0000 | 2.5049 | 1.0000 | 4.0000 | 0.0000 | 30.0000 | 65.7500 |